Computational Identification of Noncoding RNA Genes through Phylogenetic Shadowing

Authors

  • Kushal Chakrabarti
  • Daniel L. Ong
Abstract

Although fairly accurate databases exist for protein-coding genes, little is known about another important class of genes, the noncoding RNA genes. These genes, which have been implicated in a wide variety of critical biochemical pathways including brain development [7] and viral defense [5], are not translated into polypeptides. Instead, their transcribed RNAs fold into stable, base-paired secondary and tertiary structures that confer catalytic ability. For the purposes of this paper, it is especially important to note that these secondary structures cause noncoding RNA genes to contain pseudo-palindromic sequences. Unfortunately, pseudo-palindromic and other signals are not statistically sufficient for the computational identification of such genes [9]. Because noncoding RNA genes are difficult to detect even through biological techniques, it is important that accurate computational approaches be developed [10]. Although many heuristic and specialized methods have been suggested, comparative genomics approaches [8] have shown particular promise. However, even these approaches are primitive: current comparative genomics approaches are limited to two sequences, despite recent work showing the importance of using several related species [2], and they also suffer from various heuristic approximations and poor scaling. Here, we briefly present a machine learning approach to genome-wide noncoding RNA gene prediction with multiple sequence alignments. The algorithm we describe scales linearly with respect to the length and number of genomes. More importantly, the approach is statistically sound and allows the direct computation of probabilities through modular protein-coding, noncoding RNA, and intergenic sequence models.
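
To make the term "pseudo-palindromic" concrete, the following minimal Python sketch looks for a short segment that is followed, after a loop, by its own reverse complement, which is the pattern a base-paired stem leaves in the sequence. This is only an illustration and not the method of the paper: the function names `reverse_complement` and `find_stems` and the parameters `stem_len` and `min_loop` are hypothetical, and the sketch assumes exact Watson-Crick pairing, whereas real stems also tolerate mismatches and G-U wobble pairs.

```python
# Illustrative sketch only; names and parameters are assumptions, not from the paper.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string over A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_stems(rna, stem_len=4, min_loop=3):
    """Return (i, j) pairs where rna[i:i+stem_len] could pair with rna[j:j+stem_len].

    The two segments must be separated by at least min_loop unpaired bases.
    Only exact Watson-Crick complementarity is considered.
    """
    hits = []
    n = len(rna)
    for i in range(n - 2 * stem_len - min_loop + 1):
        target = reverse_complement(rna[i:i + stem_len])
        for j in range(i + stem_len + min_loop, n - stem_len + 1):
            if rna[j:j + stem_len] == target:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # GGGA + AAA (loop) + UCCC: the last four bases pair with the first four.
    print(find_stems("GGGAAAAUCCC"))
```

Such exact matches also occur by chance in random sequence, which is consistent with the abstract's point that these signals alone are not statistically sufficient for gene identification.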

Similar resources

Robust Identification of Noncoding RNA from Transcriptomes Requires Phylogenetically-Informed Sampling

Noncoding RNAs are integral to a wide range of biological processes, including translation, gene regulation, host-pathogen interactions and environmental sensing. While genomics is now a mature field, our capacity to identify noncoding RNA elements in bacterial and archaeal genomes is hampered by the difficulty of de novo identification. The emergence of new technologies for characterizing tran...

A comparative phylogenetic analysis of Theileria spp. by using two "18S ribosomal RNA" and "Theileria annulata merozoite surface antigen" gene sequences

More than 185 species, strains and unclassified Theileria parasites are categorized in the Entrez Taxonomy. The accurate diagnosis and proper identification of the causative agents are important for understanding the epidemiology, prevention and appropriate treatment. This study aims to discuss the importance of two genes of Theileria annulata 18S ribosomal RNA (18S rRNA) and Theileria annulata...

Computational Genomics of Noncoding RNA Genes

The number of known noncoding RNA genes is expanding rapidly. Computational analysis of genome sequences, which has been revolutionary for protein gene analysis, should also be able to address questions of the number and diversity of noncoding RNA genes. However, noncoding RNAs present computational genomics with a new set of challenges.

Study of Long Noncoding RNA FER1L4 and RB1, as Its Competing Endogenous RNA Network Target Gene, in Breast Cancer

Introduction: Breast cancer is the second most common cause of cancer-related death among females, which requires an exploration for markers to propose a more specific categorization of this cancer. Long noncoding RNAs (lncRNAs), the main subset of noncoding transcripts, are involved in tumorigenic processes. In this study, we investigated the expression of the fer-1-like family member 4 (FER...

Phylogenetic Analysis of Three Long Non-coding RNA Genes: AK082072, AK043754 and AK082467

It is now clear that protein is just one of the functional products of the eukaryotic genome. Indeed, a larger part of the human genome is transcribed into non-coding sequences than into protein-coding sequences. In this study, we selected three long non-coding RNAs, namely AK082072, AK043754 and AK082467, which show brain expression and local region conservation among vertebr...

Publication date: 2004